60 research outputs found

    A Bayesian network to analyse basketball players’ performances: a multivariate copula-based approach

    Statistics in sports plays a key role in predicting winning strategies and providing objective performance indicators. Despite the growing interest in recent years in using statistical methodologies in this field, less emphasis has been given to the multivariate approach. This work uses Bayesian networks to model the joint distribution of a set of indicators of players’ performances in basketball, in order to discover their probabilistic relationships as well as the main determinants affecting a player’s winning percentage. From a methodological point of view, the interest is in defining a suitable model for non-Gaussian data, relaxing the strong assumption of a normal distribution in favour of a Gaussian copula. Through the estimated Bayesian network, we discovered many interesting dependence relationships, providing a scientific validation of some known results mainly based on experience. Finally, some scenarios of interest have been simulated to understand the main determinants that contribute to increasing the number of games won by a player.
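
As a rough illustration of what replacing the normality assumption with a Gaussian copula can look like in practice, the sketch below maps each variable to rank-based normal scores and then computes an ordinary correlation matrix. This is one common way to estimate a Gaussian copula correlation and is an assumption here, not necessarily the estimator used in the paper; the function names `normal_scores` and `copula_correlation` are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def normal_scores(x):
    """Map a sample to rank-based normal scores (ties broken by position)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1      # ranks 1..n
    inv_cdf = NormalDist().inv_cdf             # standard normal quantile function
    return np.array([inv_cdf(r / (n + 1)) for r in ranks])

def copula_correlation(data):
    """Correlation matrix of the normal scores of each column of `data`."""
    data = np.asarray(data, dtype=float)
    z = np.column_stack([normal_scores(col) for col in data.T])
    return np.corrcoef(z, rowvar=False)

# Illustrative data only: the second column is a monotone, non-linear
# transform of the first, so the rank-based correlation is 1.
x = np.array([0.1, 2.0, -1.0, 0.5, 3.0])
R = copula_correlation(np.column_stack([x, np.exp(x)]))
```

Because only ranks enter the computation, the result is unchanged by any strictly increasing transformation of a margin, which is the property that makes the copula view attractive for skewed performance indicators.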

    On the estimation of the Lorenz curve under complex sampling designs

    This paper focuses on the estimation of the concentration curve of a finite population, when data are collected according to a complex sampling design with different inclusion probabilities. A (design-based) Hájek-type estimator for the Lorenz curve is proposed, and its asymptotic properties are studied. Then, a resampling scheme able to approximate the asymptotic law of the Lorenz curve estimator is constructed. Applications are given to the construction of (i) a confidence band for the Lorenz curve, (ii) confidence intervals for the Gini concentration ratio, and (iii) a test for Lorenz dominance. The merits of the proposed resampling procedure are evaluated through a simulation study.
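
A minimal sketch of a weight-normalised (Hájek-style) Lorenz curve estimate, assuming design weights equal to the inverse inclusion probabilities and a simple step version without interpolation. The exact estimator studied in the paper may differ; the function name `lorenz_hajek` and the toy data are illustrative assumptions.

```python
import numpy as np

def lorenz_hajek(y, pi, p_grid):
    """Step-function Lorenz curve estimate at the proportions in `p_grid`.

    y  : observed values for the sampled units
    pi : first-order inclusion probabilities of the sampled units
    """
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(pi, dtype=float)      # design weights
    order = np.argsort(y)
    y, w = y[order], w[order]
    cw = np.cumsum(w)
    F = cw / cw[-1]                            # normalised weighted distribution function
    cwy = np.cumsum(w * y)
    share = np.concatenate(([0.0], cwy / cwy[-1]))   # cumulative share of the total
    # number of sorted units whose estimated F does not exceed p
    idx = np.searchsorted(F, np.asarray(p_grid, dtype=float), side="right")
    return share[idx]

# Toy sample with equal inclusion probabilities
L = lorenz_hajek([1, 2, 3, 4], [0.5, 0.5, 0.5, 0.5], [0.5, 1.0])
```

Dividing by the estimated population size (the sum of the weights) rather than the known size is what makes the estimator of Hájek type.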

    Fuzzy clustering of spatial interval-valued data

    In this paper, two fuzzy clustering methods for spatial interval-valued data are proposed, i.e. the fuzzy C-Medoids clustering of spatial interval-valued data with and without entropy regularization. Both methods are based on the Partitioning Around Medoids (PAM) algorithm, inheriting the great advantage of obtaining non-fictitious representative units for each cluster. In both methods, the units are endowed with a relation of contiguity, represented by a symmetric binary matrix. This can be understood both as contiguity in a physical space and as a more abstract notion of contiguity. The performance of the methods is assessed by simulation, testing the methods with different contiguity matrices associated with natural clusters of units. In order to show the effectiveness of the methods in empirical studies, three applications are presented: the clustering of municipalities based on interval-valued pollutant levels, the clustering of European fact-checkers based on interval-valued data on the average number of impressions received by their tweets, and the clustering of the residential zones of the city of Rome based on the interval of price values.
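
To make the distinction between the two fuzzification schemes concrete, the sketch below gives the textbook membership updates for fuzzy C-medoids with a fuzziness exponent `m` and with entropy regularization (parameter `p`). It deliberately omits the spatial contiguity term and the interval-valued distance of the paper, so it should be read as a simplified assumption; the function names and the parameter name `p` are hypothetical.

```python
import numpy as np

def memberships_m(d, m=2.0):
    """Memberships from dissimilarities d (units x medoids), exponent m > 1."""
    d = np.asarray(d, dtype=float)
    u = np.zeros_like(d)
    zero = d == 0
    at_medoid = zero.any(axis=1)
    # a unit at zero distance from a medoid gets full membership there
    u[at_medoid] = zero[at_medoid] / zero[at_medoid].sum(axis=1, keepdims=True)
    inv = (1.0 / d[~at_medoid]) ** (1.0 / (m - 1.0))
    u[~at_medoid] = inv / inv.sum(axis=1, keepdims=True)
    return u

def memberships_entropy(d, p=1.0):
    """Memberships under entropy regularization with weight p > 0."""
    d = np.asarray(d, dtype=float)
    # subtracting the row minimum leaves the ratios unchanged but avoids underflow
    w = np.exp(-(d - d.min(axis=1, keepdims=True)) / p)
    return w / w.sum(axis=1, keepdims=True)

# Illustrative dissimilarities of three units to two medoids
d = [[1.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
U_m = memberships_m(d, m=2.0)
U_e = memberships_entropy(d, p=1.0)
```

In both variants each row of the membership matrix sums to one; larger `m` or larger `p` pushes the memberships towards equal shares across clusters.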

    Measuring Competitiveness at NUTS3 Level and Territorial Partitioning of the Italian Provinces

    In this paper we propose a dashboard of indicators of territorial attractiveness at NUTS3 level in the framework of the EU Regional Competitiveness Index (RCI). Then, the Fuzzy C-Medoids Clustering model with multivariate data and contiguity constraints is applied for partitioning the Italian provinces (NUTS3). The novelty lies in the territorial level analysed and in the identification of the elementary indicators underlying the construction of the eleven composite competitiveness pillars. The positioning of the Italian provinces is analysed in depth. The clusters obtained with and without constraints are compared. The resulting partition may play an important role in the design of policies at the NUTS3 level, a route already considered by the Italian government. The analysis developed and the related set of indicators at NUTS3 level constitute an information base that could be effectively used for the implementation of the National Recovery and Resilience Plan (NRRP).

    Domino reaction for the sustainable functionalization of few-layer graphene

    The mechanism for the functionalization of graphene layers with pyrrole compounds was investigated. Liquid 1,2,5-trimethylpyrrole (TMP) was heated in air in the presence of a high surface area nanosized graphite (HSAG), at temperatures between 80°C and 180°C. After the thermal treatments, solid and liquid samples, separated by centrifugation, were analysed by means of Raman, Fourier Transform Infrared (FT-IR), X-ray Photoelectron (XPS) and 1H Nuclear Magnetic Resonance (1H NMR) spectroscopy, and by High Resolution Transmission Electron Microscopy (HRTEM). FT-IR spectra were interpreted with the support of Density Functional Theory (DFT) quantum chemical modelling. Raman findings suggested that the bulk structure of HSAG remained substantially unaltered, without intercalation products. FT-IR and XPS spectra showed the presence of oxidized TMP derivatives on the solid adducts, in a much larger amount than in the liquid. For thermal treatments at T ≥ 150°C, IR spectral features revealed not only the presence of oxidized products but also the reaction of the intra-annular double bond of TMP with HSAG. XPS showed an increase of the ratio between C(sp2)N bonds involved in the aromatic system and C(sp3)N bonds, resulting from reaction of the pyrrole moiety, observed as the temperature was raised from 130°C to 180°C. All these findings, supported by modelling, led us to hypothesize a cascade reaction involving a carbocatalyzed oxidation of the pyrrole compound followed by Diels-Alder cycloaddition. Graphene layers play a twofold role: at the early stages of the reaction, they behave as a catalyst for the oxidation of TMP, and then they become the substrate for the cycloaddition reaction. Such sustainable functionalization, which does not produce by-products, allows pyrrole compounds to be used for decorating sp2 carbon allotropes without altering their bulk structure and smooths the path for their wider application.

    Hierarchical graphical models and Item Response Theory: an application to PISA 2006 data

    Item Response Theory models (Lord and Novick 1968; Rasch 1960) are a particular class of mathematical-probabilistic models whose spread is tied to the growing use, particularly in psychometric and social research, of the questionnaire as a fundamental tool for measuring one or more latent constructs. The basic idea is to translate the information obtained from the observed responses into objective measurements of the latent trait, much as is done in the physical sciences. The Rasch model, among the best known in the class of IRT models, satisfies the criteria attached to the concept of measurement and assumes that the probability that a subject answers an item correctly is a function of the difference between two parameters: the subject's ability and the item's difficulty. In particular application settings, such as Educational Assessment, this class of models is often applied to incomplete data matrices for which the missing-data mechanism can plausibly be assumed non-ignorable (Rubin 1976), which biases the parameter estimates of IRT models. In this regard, the model proposed by Holman and Glas (2005) accounts for the missing-data mechanism by considering, in addition to ability, a second latent dimension, the response propensity, which, in the case of MNAR (Missing Not At Random) data, is correlated with the first dimension. The main objective of this research was to show that the model proposed by Holman and Glas can be defined, equivalently, using the language and capabilities of graphical models (Lauritzen 1996), making explicit, through a directed acyclic graph, the conditional independence relationships among the manifest and non-manifest variables of the model. In particular, a two-dimensional Rasch model was defined from a between-items perspective (Adams, Wilson and Wang 1997).
    The Bayesian approach enriched the graphical representation in terms of flexibility, allowing a prior probability distribution to be assigned to every stochastic node of the model. The structure of the data, an Italian sample from the PISA 2006 survey, suggested introducing a further element of complexity: because the data are “nested” in levels, or hierarchies, the research considered extending the models described above to the multilevel case (following the formulation that refers to GLLAMM models). The analysis of the results is characterized by a comparison between the estimates of the bivariate model (which accounts for the correlation between ability and response propensity) and those obtained from two other models, one assuming a MAR mechanism and the other using a complete matrix in which missing data were imputed as zero (the wrong answer). To complement the analysis, the covariates were studied in order to assess their impact on both processes: the aim is to determine which characteristics, related to students and their socio-family background, influence the stages of learning and, in this particular case, the response strategy as well. Finally, to complete the analysis and the study of the bias in the parameters of interest, Differential Item Functioning (DIF) analysis was considered; the purpose is, again, a comparison among the three models, in order to verify what repercussions may arise for DIF analysis when the missing-data mechanism is non-ignorable. Overall, then, the aims of this research were multiple and interconnected: the properties of graphical models, from a Bayesian perspective, were combined with those of multilevel IRT models, with the further objective of comparing three different approaches to handling missing data.
    The application to the Italian sample, taking into account the stratification into regional groups, made the study more interesting and distinctive, confirming, in this analysis too, the strong territorial and social differences and inequalities that have characterized our peninsula for centuries.
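
The Rasch model mentioned in the abstract makes the probability of a correct response depend only on the difference between ability and difficulty. A minimal sketch of its standard logistic form follows; the function name `rasch_probability` and the symbols `theta` (ability) and `b` (difficulty) are notation chosen here for illustration, not taken from the thesis.

```python
import math

def rasch_probability(theta, b):
    """P(correct response) under the Rasch model: logistic in (theta - b)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A subject whose ability equals the item difficulty has a 50% chance of success;
# only the difference theta - b matters, not the two values separately.
p_equal = rasch_probability(0.0, 0.0)
p_shifted = rasch_probability(1.5, 1.5)
p_able = rasch_probability(2.0, 0.0)
```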

    A Kemeny Distance-Based Robust Fuzzy Clustering for Preference Data

    We propose two robust fuzzy clustering techniques in the context of preference rankings to group judges into homogeneous clusters even in the case of contamination due to outliers or, more generally, noisy data. The two fuzzy C-Medoids clustering methods, based on the same suitable exponential transformation of the Kemeny distance, belong to two different approaches and differ in the way they introduce fuzziness in the membership matrix, one based on the “m” exponent and the other on the Shannon entropy. The Kemeny distance is equivalent to the Kendall distance in the case of complete rankings but differs from it in the way tied rankings are handled. Simulations show that our methods are able to recover the natural structure of the groups, neutralizing the effect of possible noise and outliers. Two applications to real datasets are also provided.
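
A minimal sketch of the Kemeny distance between two rankings, assuming the usual score-matrix convention (entry +1 if item i is preferred to item j, -1 if the reverse, 0 if tied) and the scaling of one half of the summed absolute differences. Other scalings appear in the literature, and the exponential transformation used in the paper is not reproduced here; the function names are hypothetical.

```python
import numpy as np

def score_matrix(ranks):
    """Pairwise preference matrix; a lower rank means a more preferred item."""
    r = np.asarray(ranks)
    # entry (i, j) is sign(r_j - r_i): +1 if i is ranked ahead of j, 0 if tied
    return np.sign(r[None, :] - r[:, None])

def kemeny_distance(ranks_a, ranks_b):
    """Half the sum of absolute differences between the two score matrices."""
    return 0.5 * np.abs(score_matrix(ranks_a) - score_matrix(ranks_b)).sum()

d_reverse = kemeny_distance([1, 2, 3], [3, 2, 1])   # every pair is discordant
d_tie = kemeny_distance([1, 1, 2], [1, 2, 3])       # one pair tied in the first ranking
```

Under this convention a discordant pair contributes 2 and a pair tied in only one of the rankings contributes 1, which is where the treatment of ties departs from a plain count of discordant pairs.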

    PC Algorithm for Gaussian Copula Data

    The PC algorithm is the most popular algorithm used to infer the structure of a Bayesian network directly from data. For Gaussian distributions, it infers the network structure using conditional independence tests based on Pearson correlation coefficients. Here, we propose two modified versions of PC, the R-vine PC and D-vine PC algorithms, suitable for elliptical copula data. The correlation matrix is inferred by means of the estimated structure and parameters of a regular vine. Simulation results are provided, showing the very good performance of the proposed algorithms with respect to their main competitors.
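
For context, the conditional independence test used by the standard Gaussian PC algorithm can be sketched as a partial correlation computed from a correlation matrix, followed by Fisher's z-transform. This is the baseline test only, not the R-vine or D-vine variants proposed in the paper, where the correlation matrix would instead come from the fitted vine; the function names and the example matrix are illustrative assumptions.

```python
import math
import numpy as np

def partial_corr(R, i, j, S):
    """Partial correlation of variables i and j given the index set S."""
    idx = [i, j] + list(S)
    P = np.linalg.inv(R[np.ix_(idx, idx)])     # precision matrix of the sub-block
    return -P[0, 1] / math.sqrt(P[0, 0] * P[1, 1])

def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: partial correlation = 0 (n samples, |S| = k)."""
    r = min(max(r, -0.999999), 0.999999)       # keep the logarithm finite
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - k - 3) * abs(z)
    normal_cdf = 0.5 * (1 + math.erf(stat / math.sqrt(2)))
    return 2 * (1 - normal_cdf)

# Chain X -> Y -> Z: X and Z are correlated marginally but independent given Y
R = np.array([[1.00, 0.50, 0.25],
              [0.50, 1.00, 0.50],
              [0.25, 0.50, 1.00]])
r_xz_given_y = partial_corr(R, 0, 2, [1])
p_value = fisher_z_pvalue(r_xz_given_y, n=200, k=1)
```

PC removes the edge between two variables as soon as some conditioning set yields a p-value above the chosen significance level, so the quality of the correlation matrix fed into this test drives the recovered structure.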